Continuity of the Value of Competitive Markov Decision Processes
Author
Abstract
A Markov Decision Process (MDP) is given by (i) a finite set of states S and an initial state s₁ ∈ S, (ii) a finite set of actions A, (iii) a cost function c: S×A → R, and (iv) a transition rule p: S×A → D(S), where D(S) is the space of probability distributions over S. At every stage n ∈ N, where N is the set of positive integers, the process is in some state sₙ ∈ S. The decision maker chooses an action aₙ ∈ A, and a new state sₙ₊₁ ∈ S is chosen according to p(· | sₙ, aₙ). It is assumed that the decision maker remembers the sequence of states the process visited and his past actions. Denote by H = ⋃_{n∈N} (S×A)^{n−1} × S the set of all finite histories, where by convention B⁰ = ∅ for every finite set B and we identify ∅×S with S. A plan of the decision maker is a function σ which assigns to every finite...
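The model above can be sketched directly in code. The following is a minimal illustration, not the paper's construction: all state names, actions, costs, and transition probabilities are hypothetical, chosen only to show how a history-dependent plan interacts with the transition rule p(· | s, a).

```python
import random

# Hypothetical finite MDP: states S, actions A, initial state "s1".
S = ["s1", "s2"]
A = ["stay", "move"]

def c(s, a):
    # Cost function c: S x A -> R (illustrative values).
    return 1.0 if a == "move" else 0.0

def p(s, a):
    # Transition rule p: S x A -> D(S), given as a dict over next states.
    if a == "move":
        return {"s1": 0.2, "s2": 0.8} if s == "s1" else {"s1": 0.8, "s2": 0.2}
    return {s: 1.0}

def simulate(plan, n_stages, seed=0):
    """Run the process for n_stages stages.

    A plan maps a finite history (s1, a1, ..., sn) to an action, matching
    the assumption that the decision maker remembers all past states and
    actions. Returns the accumulated cost and the full history.
    """
    rng = random.Random(seed)
    history = ["s1"]  # the initial state s1
    total_cost = 0.0
    for _ in range(n_stages):
        s = history[-1]
        a = plan(tuple(history))
        total_cost += c(s, a)
        dist = p(s, a)
        nxt = rng.choices(list(dist), weights=list(dist.values()))[0]
        history += [a, nxt]
    return total_cost, history

# A stationary plan: it ignores the history except the current state.
cost, hist = simulate(lambda h: "move" if h[-1] == "s1" else "stay", 5)
```

A fully history-dependent plan would inspect the whole tuple `h` rather than only `h[-1]`; stationary plans are simply the special case used here for brevity.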
Similar resources
Accelerated decomposition techniques for large discounted Markov decision processes
Many hierarchical techniques to solve large Markov decision processes (MDPs) are based on the partition of the state space into strongly connected components (SCCs) that can be classified into some levels. In each level, smaller problems named restricted MDPs are solved, and then these partial solutions are combined to obtain the global solution. In this paper, we first propose a novel algorith...
Optimal Control of Piecewise Deterministic Markov Processes with Finite Time Horizon
In this paper we study controlled Piecewise Deterministic Markov Processes with finite time horizon and unbounded rewards. Using an embedding procedure we reduce these problems to discrete-time Markov Decision Processes. Under some continuity and compactness conditions we establish the existence of an optimal policy and show that the value function is the unique solution of the Bellman equation...
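The reduction described above lands in a discrete-time discounted MDP, where the value function is the unique fixed point of the Bellman equation. A minimal sketch of value iteration for such a problem is below; the two-state example, rewards, transition matrices, and discount factor are all illustrative assumptions, not data from the paper.

```python
import numpy as np

# Hypothetical discrete-time discounted MDP with 2 states and 2 actions.
# r[s, a] is the reward; P[s, a] is the next-state distribution.
r = np.array([[1.0, 0.0],
              [0.0, 2.0]])
P = np.array([[[0.9, 0.1], [0.1, 0.9]],
              [[0.5, 0.5], [0.8, 0.2]]])
beta = 0.9  # discount factor in (0, 1)

def bellman(V):
    # (T V)(s) = max_a [ r(s, a) + beta * sum_{s'} P(s' | s, a) V(s') ]
    return (r + beta * (P @ V)).max(axis=1)

# The Bellman operator T is a beta-contraction in the sup norm, so
# iterating it converges to the unique solution of V = T V.
V = np.zeros(2)
for _ in range(500):
    V_new = bellman(V)
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new
```

After the loop, `V` approximates the value function, i.e. the unique fixed point of the Bellman equation that the existence result above refers to (here in the simpler discounted finite-state setting).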
Countable State Markov Decision Processes with Unbounded Jump Rates and Discounted Cost: Optimality Equation and Approximations
This paper considers Markov decision processes (MDPs) with unbounded rates, as a function of state. We are especially interested in studying structural properties of optimal policies and the value function. A common method to derive such properties is by value iteration applied to the uniformised MDP. However, due to the unboundedness of the rates, uniformisation is not possible, and so value i...
Conditional Value-at-Risk Minimization in Finite State Markov Decision Processes: Continuity and Compactness
This study is concerned with dynamic risk analysis for finite state Markov decision processes. As a measure of risk, we consider conditional value-at-risk (CVaR) for the real value of the discounted total reward from a policy, under whose criterion risk-optimal or deterministic policies are defined. The risk problem is equivalently redefined as a non-linear optimization problem on the attain...
On terminating Markov decision processes with a risk-averse objective function
We consider a class of undiscounted terminating Markov decision processes with a risk-averse exponential objective function and compact constraint sets. After assuming the existence of an absorbing cost-free terminal state, positive transition costs away from it, and continuity of the transition probability and cost functions, we establish (i) the existence of a real-valued optimal cost function...
Publication date: 2003